YData: Seminar on Text Data Science

This YData seminar provides an introduction to the analysis of text data. The focus is on simple but often powerful text processing techniques that do not require linguistic analysis, with the goal of building familiarity with text data. Sources used in the seminar include novels, political speeches, scientific journals, online FAQs and discussion boards, Wikipedia, and consumer product reviews. Methodologies include scraping, wrangling, hashing, sorting, regressing, embedding, and probabilistic modeling. The course is based on the Python programming language within a cloud computing platform, and is paced to be accessible to students who have previously taken or are currently enrolled in YData (S&DS 123).
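As a rough illustration of the kind of simple, non-linguistic processing described above, the following sketch counts word frequencies using a regular expression and a hash-based counter. It is not taken from the course materials: the function name `word_counts`, the tokenization pattern, and the sample sentence are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens in a string.

    Hypothetical helper for illustration only. Tokens are runs of
    letters and apostrophes; no linguistic analysis is performed.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # Counter is a dictionary subclass, so counting relies on hashing.
    return Counter(tokens)


# Sample text chosen for illustration, not drawn from a course lab.
sample = "It was the best of times, it was the worst of times."
counts = word_counts(sample)

# Sort by frequency and keep the three most frequent tokens.
top_words = counts.most_common(3)
print(top_words)
```

A regular expression plus a dictionary-style counter covers the wrangling, hashing, and sorting steps in a few lines, which is why this pattern is a common starting point for text analysis.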

Calendar Spring 2019

Instructor: John Lafferty
ULA: Yi Chern Tan
Meeting time: Thurs 9:25-11:15, LC 208

Date | Topic | Notes | Lab
Thu 1/17 | Introduction & Course Overview | Slides; Demo (from YData) | Lab 01: Notebooks and Expressions in Python
Thu 1/24 | Gutenberg books; dictionaries and hashing | | Lab 02: Project Gutenberg Books (1/2)
Thu 1/31 | Gutenberg books; regular expressions | Regex tutorial | Lab 03: Project Gutenberg Books (2/2)
Thu 2/7 | State of the Union Speeches (1/2); JSON, plotting | | Lab 04: State of the Union (1/2)
Thu 2/14 | State of the Union Speeches (2/2); graphs and networks | | Lab 05: State of the Union (2/2) (Binder version)
Thu 2/21 | Scientific articles (1/2); topic models, Counters | Topic models, Counter | Lab 06: Abstracts of Scientific Articles (1/2) (Version 2)
Thu 2/28 | Scientific articles and movies (2/2) | Topic models, stemming and lemmatization | Lab 07: Movie Plot Summaries (2/2)
Thu 3/7 | Midterm | Midterm exam (practice midterm is here) |
Thu 3/28 | Wikipedia and word embeddings | Notes on embeddings, a tutorial | Lab 08: Word embeddings (1/2)
Thu 4/4 | Wikipedia and word embeddings | t-SNE, tutorial | Lab 09: Word embeddings (2/2)
Thu 4/11 | Product reviews and sentiment analysis | Overview of sentiment analysis, tf-idf | Lab 10: Sentiment analysis for beer reviews (1/2)
Thu 4/18 | Product reviews and sentiment analysis; logistic regression and k-NN classification | | Lab 11: Sentiment analysis for beer reviews (2/2)